Anomaly Detection in Dataset for Improved Model Accuracy Using DBSCAN Clustering Algorithm

Author

  • A. R. Ajiboye
Abstract

The purity of the dataset used for model construction plays an important role in the accuracy and reliability of the resulting model. Outliers are often caused by noisy data arising from mechanical faults, changes in system behaviour, or human error. It is therefore essential to pre-process a dataset prior to modelling in order to distinguish data that appear normal from data that appear abnormal within the sample space. One important reason for removing outliers is to prevent them from contaminating the dataset, which can have serious consequences if they are left in place. This paper proposes an effective measure that automatically clusters outliers in a dataset using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique. RapidMiner, an open-source software tool, is used to experiment on sample datasets; based on the characteristics of the data objects, clusters are formed that filter out the outliers from the dataset being explored. The experimental results show that the DBSCAN algorithm is a suitable technique for outlier detection and is capable of filtering abnormal data from a mixture of noisy and normal data.
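The abstract describes using DBSCAN to separate noise from normal data. The paper's experiments were run in RapidMiner, so the following is not the author's implementation; it is only a minimal pure-Python sketch of the general DBSCAN idea, in which points that are not density-reachable from any core point are labelled as noise. The function names, the sample points, and the parameter values `eps=0.5` and `min_pts=3` are all assumptions chosen for illustration.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]
labels = dbscan(points, eps=0.5, min_pts=3)

# Split the dataset into retained data and detected outliers.
clean = [p for p, l in zip(points, labels) if l != NOISE]
outliers = [p for p, l in zip(points, labels) if l == NOISE]
print(labels)
print(outliers)
```

In this sketch the isolated point ends up with the noise label and is filtered out, while the two dense groups form separate clusters. The result depends strongly on `eps` and `min_pts`, which would need tuning for any real dataset.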


Similar resources

Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Clustering, one of the main tasks in data mining, means grouping similar samples. There is a wide variety of clustering algorithms, and density-based clustering is one category among them. Various algorithms have been proposed for this approach; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...


A Novel Classification via Clustering Method for Anomaly Based Network Intrusion Detection System

Intrusion detection on the internet is an active area of research. Intruders can be classified into two types: external intruders, who are unauthorized users of the computers they attack, and internal intruders, who have permission to access the system but with some restrictions. The aim of this paper is to present a methodology to recognize attacks during the normal activities in a syst...


A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new classes of intrusions that do not exist in the training dataset for incremental misuse detection. As...


Unsupervised Video Surveillance for Anomaly Detection of Street Traffic

Intelligent transportation systems enable the analysis of large multidimensional street traffic data to detect patterns and anomalies, which is otherwise a difficult task. Advances in computer vision have contributed greatly to the progress of video-based traffic surveillance systems. However, some challenges remain to be solved, such as object occlusion and the behavior of objects. This p...


An Improved K-Means with Artificial Bee Colony Algorithm for Clustering Crimes

Crime detection is one of the major issues in the field of criminology. In fact, criminology includes knowing the details of a crime and its intangible relations with the offender. In spite of the enormous amount of data on offenses and offenders, and the complex and intangible semantic relationships between this information, criminology has become one of the most important areas in the field o...



Journal title:

Volume   Issue 

Pages  -

Publication date: 2015